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f~>) ■ We discuss the physics of RNA as described by its secondary structure. We examine the static 

' properties of a homogeneous RNA-model that includes pairing and base stacking energies as well as 

entropic costs for internal loops. For large enough costs the model exhibits a thermal denaturation 
transition which we analyze in terms of the radius of gyration. We point out an inconsistency 

Qin the standard approach to RNA secondary structure prediction for large molecules. Under an 
external force a second order phase transition between a globular and an extended phase takes 
. place. A Harris-type criterion shows that sequence disorder does not affect the correlation length 

' exponent while the other critical exponents are modified in the glass phase. However, at high 

temperatures, on a coarse-grained level, disordered RNA is well described by a homogeneous model. 
The characteristics of force-extension curves are discussed as a function of the energy parameters. 
' We show that the force transition is always second order. A re-entrance phenomenon relevant for 

^ | real disordered RNA is predicted. 

PACS numbers: PACS numbers: 87.14.Gg, 87.15.-v, 64.60.-i 
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I. INTRODUCTION 
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In this paper we discuss the equilibrium statistical mechanics of RNA or single-stranded DNA as described by 
their secondary structure (base pairing pattern). We mainly concentrate on homogeneous polymers with uniform 
' interactions between monomers which will be shown to capture well the physics of random disordered sequences at 
sufficiently high temperatures on a coarse-grained level. 

We generalize the results of de Gennes' pioneering paper (l) on (homogeneous) periodic dAT-polymers (sequences 
AT AT...) by including entropic penalties for internal loops in order to account to some extent for self-avoidance 
effects. In the case where the latter are large, we predict a thermal denaturation transition that manifests itself in 
the scaling behavior of the radius of gyration. The scaling found in the low temperature phase is smaller than N 1 ^ 3 
where N is the number of monomers of the polymer. This signals an inconsistency for large N since the monomer 
density in three-dimensional space becomes increasingly large with N. Excluded volume effects can therefore not be 
CN [ neglected in the secondary structure prediction of large molecules. 

Recently diverse micromanipulation techniques have been developed that allow to monitor the response of single 
biomolecules, RNA or ssDNA in particular [§, [}[ f|, to an external force. These experiments have raised considerable 
interest in the theoretical study of force-extension characteristics of biomolecules. Within our model force-extension 
curves can easily be obtained upon coupling an external force to the extremities of the polymer. The molecule 
undergoes a thermodynamic phase transition of second order that separates the globular collapsed state from an 
extensive phase containing a large number of small globules [^). We characterize the associated critical behavior 
and study in how far it is modified by the introduction of sequence randomness. Using a Harris-type criterion we 
I ■ argue that the correlation length remains unaffected by the disorder irrespective of temperature while other critical 
exponents may be modified. Numerical results indicate that at higher temperatures disordered models belong to the 
same universality class as the homogeneous model, while at low temperatures the collapsed phase becomes glassy and 
the critical behavior changes. 

In a recent article |^| the authors claim that the introduction of base stacking energies instead of base pairing 
energies may change the order of the force-induced phase transition. However, they were mislead by the appearance 
of a sharp first order-like crossover that occurs at a higher force than the continuous phase transition described above. 
While the latter is almost entirely of entropic nature, reflecting the large space of energetically equivalent secondary 
structures, the crossover is governed by the competition between the pairing energy and the energy gained from 
opening and stretching it. The sharpness of the crossover results from the cooperativity due to the base stacking 
energy that favors long helices of stacked pairs. 



A. The Secondary structure of RNA 



RNA is a linear polymer made up of four types of nucleotides, A, C, G and U. In single-stranded DNA, U is replaced 
by T. In solution with a sufficiently high ionic concentration to screen the charge of the phosphate backbone, the 
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single RNA-strand has a tendency to fold back onto itself to form local double-helices of Watson- Crick base pairs 
(A-U and G-C) between complementary substrands of the base sequence. The entropy loss due to a bound helix is 
compensated for by the pairing energy due to the 2 (in A-U) or 3 (in C-G) hydrogen bonds of the base pairs and, 
more importantly, the stacking energy which is gained by the expulsion of water molecules between the hydrophobic 
parts of neighboring stacked base pairs. 

The set of all base-pairings in the RNA-molecule determines its secondary structure. The typical scale of pairing 
and stacking energies is considerably larger than the energy scale associated with the tertiary structure, i.e., the spatial 
arrangement of the RNA molecule (see and references therein). This separation of energy scales is at the basis of 
the usual paradigm to split the RNA-folding problem into the analysis of the base pairing pattern and a subsequent 
determination of the tertiary structure. The set of all pairing patterns considered as secondary structures is further 
restricted by discarding all pairings between different loops, so-called pseudoknots. Such structures lead to knotted 
configurations if the helices between the loops are sufficiently long to intertwine. While knots are prevented in nature 
by the linear transcription process from DNA to RNA, short helices between loops can occur in principle, but they 
are found to constitute only a minor fraction of all base pairings. They are thus considered as elements of the tertiary 
interactions that can be neglected when determining the secondary structure. If we number the bases in the sequence 
as % = 1, . . . , N according to their position in the strand, the above constraint can be formalized by forbidding the 
coexistence of two (ordered) base pairs (i±, {12, J2) in the secondary structure with either i\ < i% < j± < ]2 or 
«2 < h < h < h- 

As we mentioned above, the separation of energy scales breaks down for large molecules, and the folding problem is 
complicated by the highly non-local condition that the secondary structure must have a realization in 3D. However, 
for intermediate degrees of polymerization N the classical approach is expected to work well as is witnessed by the 
success of secondary structure prediction tools |j, ||] that are based on the above assumptions. 

In the following we will start from the usual paradigm and concentrate on the secondary structures excluding 
pseudoknots and other tertiary interactions. In section || we discuss the statistical properties of homogeneous RNA 
working directly in the abstract phase space of secondary structures. This allows us to take into account systematically 
some excluded volume effects that reduce significantly the available configuration space of interior loops, and to gain 
some insight into the thermal denaturation of RNA. The second part of the paper is devoted to the response of RNA 
to an external force. In section III the critical behavior at the force-induced opening transition is characterized and 
the effect of sequence disorder on the phase transition is discussed. Section IV deals with force-extension curves in 
the thermodynamic limit and its properties as a function of the energy parameters and temperature. We show that 
the phase transition is always of second order, but can be masked by a subsequent first order-like crossover when the 
cooperativity of the pairing behavior due to the stacking energy is high. 



II. FOLDING OF HOMOGENEOUS RNA 



In this section we neglect all effects due to sequence specificity. Instead we consider a RNA-model where any two 
bases can form a bond, their pairing affinity being independent of the bases. This exactly solvable model describes the 
physics of "homogeneous" RNA/ssDNA strands GCGCGC. . . or ATATAT. . . 0, renormalized on the level of dimers. 
We will provide some evidence that random base sequences are also well described by a homogeneous model, at least 
at sufficiently high temperatures, if one switches to a more coarse-grained description where the monomers of the 
model correspond to short subunits of the single strands rather than to real bases. 



A. The model 



Following the empiric rules established by Tinoco's group |l(J, [TTJ we consider three different terms in the free energy 
of a given secondary structure (cf. Fig. [l] for illustration of the notions): Each base pair contributes the pairing free 
energy f pa \ r that we normalize with respect to the completely denatured chain where all bases are unpaired. This takes 
into account the mean pairing free energy (bond enthalpy and entropy cost for localization) as well as the stacking 
energy with the neighboring pair. In this way we count an excess stacking energy at one end of the helices which we 
have to compensate for by a free energy cost —/stack- 
In the following these free energies will appear in the form of the temperature dependent parameters 

i] = exp(/3/ stac k) (1) 

and 



s = cxp(-/?/ pair ), 



(2) 
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closed structures 



FIG. 1: Elements of the secondary structure. The unpaired bases between the two closed structures constitute the free part 
(thick line) of the chain. The structure on the right contains a branched loop with I = 3 unpaired bases and m' = m + 1 = 4 
outgoing stems. The contour length L of the loop is taken to be the number of backbone elements in the loop, i.e., L — I + m' . 
Complementary substrands that directly fold back onto themselves form hairpins which end in terminal loops containing at 
least t bases. 

where j3 is the inverse of the temperature T. Under biological conditions (ionic strength and pH as in a living cell) 
•q <C 1, reflecting the importance of the stacking energy as compared to the binding energy of the hydrogen bonds. 
As we will see later, this is responsible for the high degree of cooperativity in the denaturation transition. 

The last contribution is an entropic cost for each closed internal loop which accounts for the reduced phase space 
available to the loop with respect to an unconstrained string containing the same number of bases. We assume this 
part of the cost function to depend only on the length L of the loop and the number m! of stems connected to it. The 
reduction of phase space gives rise to a free energy contribution of the form log[</)(L; m')]. Specific expressions for 
<j> will be discussed later. 

When describing homogeneous RNA, we should add two further constraints on the secondary structures to be 
considered: First, terminal loops (the loops at the end of a hairpin) have to contain a minimal number t of unpaired 
bases (t = 3 from experiments) since the bending rigidity of single-stranded RNA is finite and the typical distance of 
a hydrogen bond If, is about 3-4 times larger than the base distance I in the backbone. Secondly, stacks of less than 
three base pairs are unstable, and we thus require a helix to have a minimal length of n = 3 bases. 

These two conditions do not make sense for the description of real disordered base sequences on a coarse-grained 
level. The natural values to be taken in this case are t = and n = 1. 



B. The partition function 

We denote by Z C N the partition function of a closed RNA molecule with N bases whose ends are required to form 
a helix. We can easily obtain a recursion relation for Z C N (cf. Fig. |J): The closed secondary structure terminates in 
a helix containing k > n base pairs. It is followed by a first loop which contains I > unpaired bases and m > 
closed substructures, containing Li >2n + t bases, respectively. The arrangement of the free bases and substructures 
within the loop underlies no constraints and gives rise to a combinatoric factor. The loop contributes an entropic 
cost <j>(L; m! = m + 1) where we take the length L to be given by the number of backbone elements it contains, i.e., 
L = I + m + 1. Finally, the sum over all configurations can be decomposed as 

rn / l \ 1 m 

^=,E-*E E - E w+i+Xu-k) ( m :' , (l+ro+ 1 1;m+1) n^- (3) 

k>n {l,m}L 1 >2n+t L m >2n+t 1=1 v 7 ^ v ' ' »=1 

In the sum over / (number of unpaired bases) and m (number of outgoing stems) the following pairs have to be 
excluded: (m = 0, 1 < t) to prevent terminal loops smaller than t, and (m = 1,1 = 0) to avoid double counting of 
structures. 
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FIG. 2: Schematic representation of the recursion relation for the partition function of closed structures used to obtain 
Eq. (^) . The sum on the RHS is over the length k of the terminal helix, the number of unpaired bases I in the first loop and 
the number m and sizes Li (i = 1, . . . m) of closed structures attached to the loop. The shown configuration corresponds to 
k — 4, 1 = 7, m = 3. The loop contributes an entropic cost <fi(L; m'), where m = m + 1 is the number of outgoing stems and L 
is the contour length of the loop, L = I + m! . 



The structure of Eq. (g) suggests to study the generating function ^ C (C) = Y^N=2n+t ^nC of the partition function. 
Taking the discrete Laplace transform we obtain 



- (f\= (<2) " V 5 c (Q m C' (m + l\ ( ) 

" cUJ V l-sC 2 ^ 6(l + m+l;m + l) I m J ' W 

{m,Z} 

To proceed we have to make an assumption about the loop cost function m). Neglecting loop costs altogether 
corresponds to putting (f>(L; m) = 1 which yields 

(sC 2 ) n ( 1 I — /"* \ 

^-'r^d^RFc- 8 - 10 -!^?)- (5) 

We will use this simple model in section |fy] to discuss the general shape of the force-extension characteristics. We 
expect it to describe well the low temperature regime where large internal loops are negligible. In order to describe 
denaturation we should however use a more realistic loop cost function. 



C. Denaturation 



If RNA were an ideal chain without self-interaction, the entropic cost of a closed loop would just derive from the 
probability of a three-dimensional random walk to return to the origin, and thus 4>{L; m) cx L 3 / 2 for large values of L. 
This corresponds to the case discussed in 0, |[ where the authors start from real space recursion relations treating 
the single strands as ideal chains. If one considers the loops as self-avoiding walks, forgetting about the stems that 
are connected to them, one is lead to use (j)(L;m) cx L 3vsAW , with the wandering exponent vsaw = 0.588 (in 3d) 
characteristic of a self- avoiding walk. (This is the form used for large interior loops in the Zuker algorithm 
Clearly, this is too simple since the branches attached to the loop have a non-negligible effect on the conformational 
degrees of freedom of the loop and one should consider a more sophisticated form of the loop cost. 

A generalization of these entropy cost functions can be obtained from the results of Duplantier and coworkers 
|fl3[ |l4| for the configurational entropy of a network with given topology. In order to find the scaling of the effective 
entropy cost of an internal loop as a function of its size we consider the secondary structure as a tree-like network 
of helices, linked by internal loops. Let us single out one internal loop with m branches that we idealize as outgoing 
rods (see Fig. ||) . In the limit of loop sizes much smaller than the extension of the rods the scaling of the entropy 
cost for the internal loop follows from comparison of the expressions for the configurational entropy of a star-like 
network with m rays, r sta r(™), with that of a small loop with m attached branches ri oop (L; m). The latter scale as 
r s tar(m) ~ ]\T7star(m)-i and r loop (L; m) - N" n °°p( m) - 1 g{L/N) where N is the typical length scale of the attached rods 
(the remainder of the network), L is the length of the loop and g(x) is a scaling function. In the limit L/N — » the 
scaling of two expressions should coincide which requires g(x) ~ x 7loop ( m )-i't^{m) _ e ff ec ^j ve i 00 p cos |; function then 
follows as <j)(L;m) ~ Fi oop (L; m)/T s t &r {m) ~ a(m) ■ (L)"( m ^ where v(m) = 7 s tar(w) — 7i 00 p(™)- The renormalization 
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FIG. 3: The effective entropy cost for internal loops of length L can be obtained by idealizing the environment of the loop 
as m outgoing rods (here m = 5) and comparing the scaling expression for its configurational entropy with that of a star-like 
polymer with m rays. 



group results from 
of a vertex with k 



I3| , |l4| ] yield vjrri) = 3^saw — + "m where tJfe is the exponent related to the renormalization 
egs. (See also [l6|, for a completely analogous reasoning in the closely related problems 
of DNA denaturation and studies of 'slip- linked' polymers). While good estimates are available via e- expansion for 
small values of m jl3[ |l4| the thermodynamics of denaturation is essentially determined by the behavior of the cost 
for loops with many attached stems, i.e., by v[m) for large m, about which very little is known. We can proceed, 
however, without knowing an exact expression for v(m). Instead we will illustrate the general condition ()£]) below 
with a discussion of the ad hoc forms 4>{L; m) = a(m) ■ (L) y ( m ) with v{m) = v* = const., and v(m) = vq + mv\, the 
first one being a reasonable approximation for the case that v(m) saturates at v* for large values of v, the second one 
assuming that each branch contributes a further entropic constraint on the loop conformations, as suggested by the 
term ma^ above. The prefactor a(m) is assumed to be a moderate function of m that does not grow exponentially. 

The asymptotic behavior of the partition function Zfj can be derived from the generating function 2 C (C) without 
performing the full inverse Laplace transform. It is given by Z C N ~ (~ N /N a where C* is the smallest value of C at 
which S C (C) becomes non-analytic. The leading finite size corrections in the form of the pre-exponential factor 1/N a 
are determined by the nature of that non-analyticity. 

There are only two possible singularities for H C (C) : Taking successive derivatives of Eq.([|) one can check that all 
derivatives <i fe S c (£)/(i£ fc exist, unless the partial derivatives of both sides of Eq. (0) with respect to S c are equal, i.e., 



1 = rj 



« 2 )" 



E 

in J 



4>{l + ?7i + l;m+ 1) V m 



(6) 



In turn this condition is sufficient to ensure a non-analyticity of S c (£). Writing S c (£) = H C (C*) — S and expanding 
Eq. (|J) for small (C* — C) one finds S 2 ~ (C* — C)- Hence the singularity of S C (C) is approached as 

Sc(0 = S C (C*) - const. • (C* - C) 1/2 + 0(C, - C), (7) 

which leads to finite size corrections of the form Z C N ~ C^~ N /N 3 ^ 2 . This result is central to the discussion of the critical 
behavior at the force-induced denaturation transition. Note that the pre-exponential factor is essentially independent 
of the specific choice of the model, in particular, it does not depend on the shape of the loop cost function <j). 

The second possible singularity corresponds to the double sum on the right-hand side of Eq. (|j) being evaluated 
at its radius of convergence. To analyze this situation in more detail we suppose that the loop penalty assumes the 
form 4>{L; m) — a(m)L v ^ m \ According to Hadamard's formula, the radius of convergence is determined by 



lim? 



L 

E 



(s c (co/c*r 



^ a(m + !)(£+ l)"^ 1 ) 



-,1/L 



= 1. 



(8) 



In the case where v(m) grows at most sub-linearly in m, i.e., v{m)jra m z2^ a Q ; the sum can be estimated by its 
saddle point, and one finds the condition S C (C*) + = 1. In the same way, v(ra) = vq + v\m (with v\ < 1) leads to 
the condition = 1, but, since the sum diverges when £* — > 1, the singularity is ruled out in this case. 

The singularity given by Eq. (^|) is always the smallest one at low temperatures. The system undergoes a thermody- 
namic phase transition when the singularity determined by Eq. (|^) crosses the first one as a function of temperature. 
This can only occur if the first derivative with respect to S c of the double sum in Eq. (Q) stays finite on approaching 
the radius of convergence from below, that is, for S c ((*) +£»—►].. This requires lim m _ KX) f(rn) > 2. As we will show 
below the corresponding phase transition is associated to thermal denaturation. In all other cases our model does 
not exhibit a phase transition but only a crossover whose sharpness depends both on the loop cost function and the 
stacking as described by the parameter r]. 
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FIG. 4: The denaturation transition is best described by the typical distance d between bases within the secondary structure 
whose scaling changes from N 1 ^ 2 in the globular state at low temperatures to N in the necklace-like state above Td- (The 
dashed curve indicates d/N for a molecule with finite N. The transition becomes sharp only in the thermodynamic limit.). In 
contrast, the fraction of paired bases n v or the relative number of helices nu/N only exhibit a crossover at the denaturation 
temperature, but never drop to zero. 

D. Radius of gyration 

Let us now characterize the thermodynamic properties of RNA. Most observables can be obtained as appropriate 
derivatives of the free energy per base which is simply related to via / = ln(£*). For example, the fraction of 
paired bases is given by n p = 9ln(£*)/9ln(s) and the average number of helices is nh = N d rn(C*) / d In (77) . Evaluating 
these derivatives in the two possible phases one finds that n p and n^/N are both finite at all temperatures and thus 
do not provide a good order parameter for the phase transition (see Fig. |]). The (small) extensive number of pairings 
even in the high temperature phase is due to accidental pairings of bases that are close to each other within the linear 
RNA-strand. This result is independent of the details of the model such as the minimal number of bases in a hairpin 
loop, entropy costs and energy parameters. 

A better choice of observable is the average distance between two bases that belong to two different terminal loops 
of the secondary structure. This quantity is a measure for the diameter of the molecule and distinguishes the compact 
globular phase from the denatured loose phase. We define the distance between two bases as the length of the shortest 
path linking them through the secondary structure, see Fig. This path is a succession of loops and helices, its length 
being given as the sum of the lengths of the helices and half the contour lengths of the loops. 

Alternatively, we can consider again closed secondary structures constrained to terminate in a helix. It is easy to see 
that the same structural information as encoded by the distance between terminal bases is captured by the average 
distance of terminal bases from the closing base pair (1,7V). The latter quantity is more convenient to compute, 
however. 

Let us denote by n(d;S) the number of bases in terminal loops at distance d from the pair (1,7V). We define the 
Boltzmann weighted sum over secondary structures on a closed strand with N bases 

C N (d)=Y,Z c N (S)n(d;S), (9) 

for which we can write down a recursion relation in the same spirit as in Eq. (^|) : The sum over all secondary structures 
can be decomposed into a sum over the length k > n of the terminal helix starting at (1, N), and the first loop. The 
latter can either be a terminal loop containing N — 2k unpaired bases, or there can be further closed structures 
connected to it. In the second case, we single out the closed structure X whose terminal loops we want to consider, 
and denote by mi, l\ and m^, h the number of closed structures and unpaired bases to cither side of X in the loop. 
The mi + 777,2 closed structures just contribute the product of their partition functions, together with appropriate 
combinatorial factors for their arrangement within the loop, whereas the structure X contributes the Boltzmann sum 
Cl x (d') for a distance d! that is reduced with respect to d by the length k of the terminal helix and half of the loop 
contour m\ + l\ + 771,2 + h + 2. This results in the recursion 



FIG. 5: Definition of the distance d between bases in terminal loops: The unique shortest path through helices and loops from 
one terminal loop to another allows us to define d as the sum of the lengths of the helices and half of the contour lengths of the 
encountered loops, including the two terminal loops. For the bases in the figure one has d = 3/2 + 3 + 15/2 + 3 + 15/2 + 3 + 3/2. 



c N (d) = 



TjS 



k>n 



(N -2k)S(d- N/2-1/2) 
cj)(N - 2k + 1) 



E 



mi,Zi>0 m 2 ,l 2 >0 L x >2n+t 



E 

Li>2n+t; 
i— l,...,mi+m2 



rni+mj 

S(2k + h + h + U 



N) 



C Lx [d - k - (mi + h + m 2 + h + 2)/2] 
(f>(h +m 1 +l 2 + m 2 + 2) 



mi + h 
mi 



n z i 



m 2 + h 
m 2 



(10) 



Here we used the form (j)(L;m) = 4>{L) for simplicity. Passing to the Laplace transform with respect to both 
variables N and d, 



c(c,p)= E E c ^ 

N>2n+t d>n+t/2 



N e -pd 



(11) 



the equation is easily solved, 



C(C,P) = 



(12) 



We have introduced the functions (x) — J^N>t Nx N /<f>(N + 1) and g<f,(x) = J2n>i x n /(/>(N + 1). 

Note that C((,p = 0) has two possible singularities, the vanishing of the denominator and the singularity in g'^ 
that occurs when ( + S c (£) = 1. The first singularity is associated with the globular phase, the vanishing of the 
denominator being equivalent to condition (||) in the case <p(L;m) = 4>{L). The singularity related to g'Ax — > 1) 
governs the denatured phase. The denaturation transition occurs when the two singularities cross which can only 
happen if g'^{l) is finite. 

Let us now calculate the mean distance from bases in terminal loops to the free part. This is given by the logarithmic 
derivative of Cn{p) with respect to p 



(d) = d p C N ( P )\ p=Q /C N {p = 0) (13) 

which can be evaluated by inverse Laplace transform of d p C((,p)\ p= o and C((,p = 0) with respect to £• Note that 
<9 p Cjv(p)|p=o arL d Cn(p = 0) have the same leading exponential asymptotics since their smallest singularities are the 
same, but their finite size corrections (~ N /N a differ and will determine the scaling of (d) as a function of N. 
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Let us now analyze the case of a loop cost function with asymptotics <f>(L) oc L v and v > 2 which implies a phase 
transition. In the low temperature phase, the denominator in Eq. ( |l2|) vanishes like (£* — C) 1 ^ 2 as C approaches 
from below (cf. Eq. (Q)) which gives rise to a = 1/2 for the back-transform of C(£,f> = 0). On the other hand, the 
derivative dC((,p)/dp\ p =o diverges like (£* — Q^ 1 which yields a — 0, whence (d) ~ N 1 / 2 . 

The situation changes in the high temperature phase where the denominator remains finite but the generating 
function develops a (leading) singular part C s i n g(C>P = 0) ~ (C* — CY~ 2 giving rise to a = v — 1. Correspondingly, 
the singular part of the derivative behaves as (£* — C)"" 3 : which implies a = v — 2, and therefore (d) ~ AT. The high 
temperature phase is characterized by typical distances of order A~ from terminal bases down to the free part. This is 
what one expects for an essentially free, non-collapsed chain (but constrained to be paired at the ends (1, A r )). The 
secondary structure is rather trivial in this case, consisting essentially of one big loop with small structures attached 
to it. The low temperature scaling (d) ~ N 1 / 2 , however, indicates the collapse to a globular state with a rich branched 
secondary structure. 

We mention that the latter scaling can easily be derived in the absence of loop costs from the 'mountain height' 
representation of secondary structures |L8[ where the distance of terminal bases to the free part scales like the height 
of the mountain representation. This in turn is the typical excursion of a random walk of N steps in one dimension h, 
constrained by ft. > and h(l) — h(N) — which is known to scale as N 1 / 2 . There is, however, no simple equivalent 
of the above phase transition in the mountain height or random walk picture since the loop cost translates into an 
awkward non-local energy term. 

The distance d is a structural property of the tree-like skeleton of the secondary structure. In order to relate it 
to the real diameter of the molecule we assume that the helices and parts of internal loops connecting a terminal 
loop to the free part essentially realize a (constrained) random walk in space. If we assumed the random walk to be 
ideal, the radius of gyration of RNA-molecules would follow from the above findings as R g ~ d UWM ~ N 1 / 4 with the 
wandering exponent for ideal random walks i/rw — 1/2- This has already been obtained in Q. However, the random 
walks should at least be considered as self-avoiding, having a larger wandering exponent t'sAWi and correspondingly 
R g ~ 7V iy sAw/2 ^ j n addition, there are also constraints from the presence of the remainder of the molecule which 
have the tendency to increase the value of the wandering exponent. This effect is difficult to estimate, but we may 
obtain some idea about its importance by considering the two dimensional case: The wandering exponent for self- 
avoiding walks is known exactly as ^saw = 3/4 . On the other hand, we need to know how the Euclidean distance 
d en between two points on a branched polymer (a coarse grained version of the secondary structure) scales with the 
length of the shortest path between them, the so-called chemical distance d c h. The exponent !/ c h defined by d cn ~ d 1 ^ 1 
is exactly known for the special case of a space- filling branched polymer where it takes the value = 4/5 ^0|. It is 
to be expected that the exponent is slightly smaller in the case of a branched polymer of arbitrary density and thus 
almost equals fsAW- This suggests that the wandering behavior is only weakly affected by the presence of branches 
connected to a self-avoiding walk, and the approximation of v c h by vsaw is quite good. 

E. Discussion 

If we assume that also in the three dimensional case the presence of side branches does not increase substantially the 
wandering exponent from its value for the self-avoiding walk and suppose R g ~ ]\]~ u sav//2 we encounter a consistency 
problem in the thermodynamic limit: the monomer density in space diverges as N/ R? g ~ N 1 ~ 3l ' SAW I 2 in the collapsed 
phase. This problem is common to all models considered above, irrespective of the existence of a denaturation 
transition. (It does not occur if the side branches have a much stronger effect than in 2 dimensions and increase 
the wandering exponent beyond 2/3.) It reflects the fact that (local) entropy cost functions for interior loops are 
not sufficient to take into account global spatial constraints. The model customarily used in RNA prediction will 
thus be inconsistent for sufficiently large molecules in that it neglects excluded volume effects, deferring them to the 
subsequent analysis of the tertiary structure. This separation is however only justified as long as typically obtained 
secondary structures can easily be accommodated in space. 

In order to demonstrate that the standard RNA structure predicting programs (Zuker's mfold p^ |, Vienna package 
) are indeed limited by their neglect of excluded volume effects, we have used them to determine the folding of RNA 
sequences that were deliberately designed to form 'fractal' secondary structures (see Fig. ^). These were constructed 
starting from a short terminal helix that ends in a loop with two closed structures attached to it. Each of them again 
starts with a short helix that ends in a branched loop with two outgoing stems, and so fourth in a self-similar way. 
The density of such tree- or star-like structures grows exponentially with their radius. Even with a modest number of 
bases one can design sequences for which the structure prediction indeed yields the desired pairing pattern that can 
hardly be accommodated in space and should therefore be ruled out. 

While this is not a problem for small RNA molecules like transfer RNA or ribosomal RNA, the natural sizes of 
messenger RNA are on the order of several thousand bases which is likely in the regime where excluded volume 



9 




FIG. 6: Schematic 'fractal' secondary structure. The large circles represent internal loops, the rods symbolize helices and the 
small circles hairpin loops. By choosing various complementary G-C sequences on the helices and A's in the terminal loops, 
e.g., it is easy to design sequences that the structure prediction algorithms predict to fold as depicted. The resulting structure 
is extremely dense in real space and must be discarded as an admissible folding. 



effects play an important role for the folding and the usual structure prediction algorithms based solely on the base 
pairing pattern will likely fail. The inclusion of a effective loop costs to take into account the reduced available phase 
space might help to take into account those effects to some extent, but as we have seen they fail to cure the problem 
completely in the low temperature phase. 

At this point, it is worth mentioning that although neither loop costs nor the topological condition on the absence 
of pseudoknots are able to avoid (too) dense secondary structures, the situation would be much worse if no topological 
constraints were introduced in the model at all, that is, if all base pairings were allowed, irrespective of the resulting 
entanglement of the structure. It is rather obvious that a generic base pairing pattern obtained in such a model 
could not be accommodated in space: Let us consider a base pair and the two strands to which it belongs. If there 
is no constraint on the pairing behavior of the nearby bases within these strands, they can be paired to completely 
different parts of the chain which are then all forced into the same spatial region. The topological constraint forbidding 
pseudoknots weakens this tendency, since the substrand embraced by the given base pair is only allowed to interact 
with itself, which reduces largely the possibilities of spatial entanglement. 

The above observations lead to the conclusion that typical structures for large molecules are determined by a 
competition between favorable base pairings and the requirement that the resulting secondary structure can be 
accommodated in space. This will result in rather densely packed and entangled spatial arrangements of RNA that 
we expect to exhibit very slow dynamics and glassiness due to the inevitable spatial hindrance to pass from one 
favorable folded state to another. This is indeed observed in folding experiments on large ribozymes, where several 



misfolded states compete with the correctly folded native state 21, |22|. Thirumalai and Woodson |23|] propose a 



'kinetic partitioning mechanism' to describe this type of slow dynamics, according to which a fraction of all molecules 
fold directly to the ground state while the remaining molecules remain in metastable misfolded state until they find 
a pathway via the transition state ensemble to the native state. Glassiness may also arise purely on the level of 



secondary structure |L8|, 24, |2q, Eft 27, |28|, |29[ where topological constraints (in the form of backbone connectivity 
or constraints on pseudoknots) introduce a weak frustration in the system and establish a multiplicity of metastable 
valleys in phase space. For small molecules this has been shown to lead to slow dynamics [ |30| . In large molecules 
the jamming of the spatial arrangement of energetically favorable secondary structures probably plays an equally 
important role. It will be a major challenge to understand the interplay of secondary and tertiary structure and its 
relation with the closely related problem of protein folding. 



III. RESPONSE OF RNA TO AN EXTERNAL FORCE 



A. The partition function with force 



In this section we will extend our formalism to treat problems with an external force. In experiments, the RNA 
molecule is usually fixed on one end while the other end is manipulated by optical tweezers, magnetic beads in an 
inhomogeneous field, or the cantilever of an atomic force microscope. The extension of the molecule is monitored as 
a function of the applied force, or the position is imposed and the average force needed to maintain the position is 
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FIG. 7: RNA under an external force. The force pulls on the free part of the chain that can be subdivided into single 
stranded portions containing St unpaired bases (i = 0, . . . , m). Those are separated by closed structures, containing Li bases 
(i = 1, . . . , m). The free length derives from the terminating bonds of the closed structures (contributing C^if) to the partition 
function) and the backbone elements in the single-stranded parts (contributing Cas (/) m the approximation of uncorrelated 
monomers) . 



measured. Here we will concentrate on the situation where the force is fixed while the extension is subject to thermal 
fluctuations. 

To include the effect of an external force we have to add the term — F ■ (rjv — ri) to the energy of the system where 
fi denotes the spatial position of the i'th base. The force only acts on the free part of the chain, see Fig. g and we 
can rewrite the additional term as 



h-i 

- F ■ (f N - fi) = - J2 $ ■ - %))> ( 14 ) 

i=l 

where {b(i)} is the ordered list of all bases in the free part, and b(lb) — N. There are two types of contributions to 
the sum: Terms with b(i + 1) = b(i) + 1 correspond to successive bases in the backbone of the RNA while terms with 
b(i + 1) > b(i) + 1 correspond to the paired bases terminating the closed structure between bases b(i) and b(i + 1). 
For simplicity we consider both the distance / of bases within the backbone and the distance lb of covalently bonded 
bases as fixed. We treat the terminating hydrogen bonds of closed structures as free joints that are inserted between 
single strands of unpaired bases. They contribute a factor 1/Cb(/) = sinh(/3/4)/(/3/4) to the partition function. For 
the single strands in between we will restrict ourselves to the simple model of a freely jointed chain. If one wants 
to fit to experimental data ||, one should however consider a more realistic description involving correlations of the 
monomers on the scale of the persistence length (l p « 3/). At high forces, bond elasticities should also be included. 
Both modifications can straightforwardly be taken into account in the formalism below. 

The partition function zj^(f) including the external force can easily be obtained once the partition function Z < f f (f) 
of closed structures is known. We may decompose the sum over all structures into a sum over the number m and 
sizes Li (i = 1, . . . m) of closed structures in the free part, and the lengths Si, (i = 0,...m) of the single-stranded 
segments linking them: 



4(/) = E E E 6(s Q +Y,(L* + s t )-N)zg n 

m>0 S ,.--,S m >0 Li,...,L m >0 »=1 i=l 

Here we have introduced the partition function Z^(f) of a single-stranded segment with N backbone elements under 
the force /. The corresponding generating function follows from a discrete Laplace transform as 



Z L m Z S m (f) 



Cb(f) 



Cb(f) 



(15) 



The partition function is again found by inverting the Laplace transform, in particular, the free energy derives from 
the logarithm of the smallest singularity of S^(£; /). Apart from the singularities of 3 C (C) that we discussed in section 
[IC, 3f also has a pole singularity £(/) when the denominator in Eq. ( |l6| ) vanishes: 
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S e (C(/))S ss (C(/);/)-l 

1 = ^(7) on — • (16) 

Let us now fix the temperature. The singularity deriving from S c then takes the force-independent value C* = C*CO- 
A phase transition occurs at the critical force f c (T) where £(/) crosses (*(T). For larger forces, the force-extension 

characteristics follow from (L(f)/N) N = oc _ [3~ 1 d \n[C,{f)]/df in the thermodynamic limit. Here, L(f) denotes the 
projection of the end to end distance of the molecule onto the direction of the force. 

In the following we are interested in two aspects of the force-extension curve. First, we will examine in detail the 
critical behavior around f c . Later, we address the temperature dependence of f c {T) and an associated re-entrance 
phenomenon slightly below denaturation. Finally, in section |y| we will discuss in detail the dependence of the 
force-extension characteristics on the parameters s and r\. 



B. The critical behavior around f c 



In the following we treat the single stranded parts as freely jointed chains, whose generating function is given by 

S SS (C;/) = 1/(1-C/U/)), (17) 

where ( S s(f) = sinh(/3/Z)/(/3/?). Furthermore, we only consider temperatures below denaturation. 
In the vicinity of the (finite) critical force we may restrict ourselves to the relevant singularity structure of S(£; /) 
and expand the denominator in Eq. ( |l6|) to lowest non-trivial order around f c and £* . Using Eq. (0) we find 



S/(C;/)W (i-C/C*) 1 / 2 -^/-/,)' (18) 



or, on substituting £ = e s and C* = e s * , 

D 

(s - s*)V2— A (f - f c ) 



— q W 2_ —* -v (19) 



A and B are slowly varying functions of / and £ that we replace by their values at the critical point, A = A(/ c , £*) 
and B = B(f c ,£*). The (continuous) inverse Laplace transform of Eq. ( [l9| ) is explicitly known, and we obtain the 
partition function in the transition region as 



4(/) = -= i,{A{f - f c )N^), (20) 
VirN 

where ip( x ) = 1 + \pKx exp(x 2 )erfc(— x). The force-extension characteristics follow immediately as L(f) = 
yliV 1 / 2 (lnV')'[v4(/ - f c )N 1/2 ]. In the asymptotic regimes of the scaling variable x = A(f — f^N 1 ' 2 one obtains 
the expansions 



2A 2 (f-f c )N 



W) 



1 



1 

2.i- 



AN 1 ' 2 ^ 

-r— [1 - 

fc-f L 1 



0{x 2 )) 



X > 1, 

M < i, 

X < -1. 



(21) 



Sufficiently above the critical force (i> 1), the extension grows linearly with f — f c and scales as the system size. 
The chain organizes in a kind of necklace: the number of closed structures in the free chain is proportional to N, 
their average size being finite. In the low force regime (x <C 1) the chain is collapsed, but its extension diverges as 
l/(/ c — /) upon approaching the critical point. 

There are two critical exponents characterizing this phase transition: At the critical force (x -C 1) the extension 
obeys a power law L ~ N s with S = 1/2. The second exponent is related to the characteristic length scale in the 
problem, jV c oc (/ — f c )~ u where v = 2, as one can read off from the form of the scaling variable. Below we will see 
that N c can be understood as a correlation length. 
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C. Correlations and length scales at / > f c 

Slightly above the critical force, the typical number of bases in a closed structure is given by lt yp — N/n cs (f) 
where n CB (f) is the number of closed structures in the free part. It has the same critical behavior as the extension, 
i.e., n cs (f > f c ) cx N(f — f c ), and thus, l typ cx l/(/ — f c ). This is surprising since the characteristic length scale 
N c cx (/ — / c )~ 2 diverges much faster. 

To understand the meaning of N c let us introduce the indicator function rji which equals 1 if base i belongs to 
the free part, and otherwise. The correlation function (rjirjj) is simply obtained as the ratio between the partition 
function with bases i and j constrained to be free, and the total partition function, 



<W,> = ZLZ ^-^ Z --^ . (22) 

N 

Using Eq. ( p(i|) we obtain the connected correlation function as 

(mvitiMM = z U-2 z n 1 „ e -a-i-2)[A(/-jc)]' 

(ViKVj) ~Vi(i-j-2HA(/-/c)] 3 ' 1 } 

where the last approximation is valid in the scaling regime above the critical force for i, N — j,N 3> (j — i) 2> 
[A(f — / c )]~ 2 . The quantity N c — [A(f — / c )]~ 2 clearly appears as the correlation length beyond which the pairing 
behavior becomes essentially independent. To see why the correlation length is much larger than typical closed 
structures, let us look at the probability distribution of the sizes of the latter. 

Suppose that a closed structure starts at base i. The probability that it is paired to the base j = i + 1 + 1 is given 

by 

rf ycn>J 



pm _ Z i-i Z l Z N-(i+l+2) exp(-/A 2 (/ - f c f) 

[ ) ~ V 7 f Z c Z f I 3 / 2 ' 



from which we recover the expectation value for the structure size lt yv = (1) = J2t ^(0 ^ — /c)- On the other 
hand, we can calculate the fraction xQ*) of bases that belong to closed structures of size at least U: 



v~ /P m w f°° 1/2 — '( — w = erfc A ^ - /c Ji ■ ( 25 ) 

A finite fraction of all bases thus belong to structures of size 0((f — / c ) -2 ) which sets the scale of the correlation 
length N c . The vast majority of closed structures is much smaller, however. 



D. The critical behavior with sequence disorder 



After having understood the critical behavior in the homogeneous case, it is natural to ask whether disorder in the 
form of sequence inhomogeneities and varying pairing affinities between the bases is a relevant perturbation for the 
force-induced phase transition. In j3l], |3^] the authors studied the force-induced unzipping of DNA and found the 
presence of disorder to significantly alter the critical behavior with respect to that of a homogeneous double-strand. 
In RNA, the disorder effects are less pronounced since the two opening transitions are not really of the same nature. 
In DNA, essentially all base pairs are broken up at the transition and the double-strand becomes denatured. The 
force always acts only on the single base pair closing the yet unzipped double helix. In RNA, however, the transition 
occurs at a point where the entropy of large secondary structures and the free energy gain from the extension of the 
chain compete in a quite subtle manner, the base-pairing energies playing a less important role at the critical force. 
Furthermore, already in the critical region the force acts in parallel on a large number of globular structures aligned 
along the free part of the chain, which averages out the effect of disorder to some extent (3^, Q . 

In the case of RNA will be interested in the low-temperature regime where we can simplify the model by neglecting 
the loop cost function, i.e., putting <f> = 1. Furthermore, we replace the pairing and stacking free energies by simple 
(temperature-independent) pairing energies ey between the bases i and j. This does not change the critical behavior 
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FIG. 8: Scaling plot of force-extension curves for disordered models (power law distributions (PL) with a = 6 and a — 10, 
and the 4 letters model) at high temperature (T = 0.6). For better visibility, the extension of the 4 letters model has been 
multiplied by 1.5. 




FIG. 9: Force-extension curves from Fig. [8| and for homogeneous models at different temperatures T and pairing energies e. 
All curves superpose with the analytical prediction ( pi] ) L(f) = DN 1 ^ 2 \(C(f — /c)-/V 1//2 ) upon rescaling with model-dependent 
factors C, D. The error bars are smaller than the symbol sizes. 



at the force-transition in the homogeneous case (e^ = e), and we checked numerically that a disordered model with 
pure stacking energies leads qualitatively to the same results as the pairing model. 

As in earlier work |l^, [35| we consider different types of disordered models. The most natural one starts 
from RNA made of the four base species 6j £ {A, C, G, U}. The pairing energies will then depend on the sequence 
via eij = E(pi,bj) where E is a symmetric 4x4 matrix. We used the simple matrix E(C,G) = E(G,C) = —3, 
E(A,U) = E(U,A) = -2 (Watson-Crick pairs) and E(G,U) = E(U,G) = -1 (wobble pairs), and E = +oo for 
all other pairs. Alternatively we considered more abstract random coupling models where the are independent 
variables taken from a distribution P(e). In the following we focus on the two cases where P(e) is Gaussian or has 
power law tails decaying like |e| _a , respectively, both being centered on a negative value. 

The numerical evaluation of force-extension characteristics for these types of models is straightforward using the 
0(N 3 ) recursion relation as introduced in |]36| , ^| to compute the partition function Z^ N exactly for a given realization 

for a related investigation, and J34| for a more thorough discussion of the effects of 
disorder and the low temperature behavior). In Fig. || we show scaling plots of the disorder-averaged force-extension 
characteristics for several disordered models at temperatures well above the glass transition temperature |l^, p9[ . 
The data collapse in the critical regime was obtained optimizing the critical force in the scaling ansatz L{f) — 
N 1 / 2 ^^ — f^N 1 / 2 ) supposing that the critical exponents are the same as in the homogeneous case {5 = \ jv = 1/2). 



of the disorder (see Refs. 1 33 
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FIG. 10: Scaling plot of force-extension curves for the Gaussian model at T=0. The critical exponent is modified by the 
disorder to 5 ~ 0.7. 



The scaling works well for the model with four letters and for the random coupling models with a Gaussian probability 
distribution, or with P(e) ~ e~ a and a > 4. However, the data obtained for distributions P(e) ~ e~" with a < 4 
(not shown) cannot be collapsed satisfactorily even when allowing for different critical exponents. This is due to the 
dominance of some rare but very strong couplings as will be explained in the next subsection. 

By rescaling the axes of the plots with model-dependent metric factors C and D, L(f) = DN 1 / 2 \(C(f — f^N 1 ^ 2 ), 
one can perfectly superpose the scaling functions for the different models with that of the homogeneous model, as 
shown in Fig. ^|. This indicates that, above the glass transition temperature, disorder is irrelevant for the force-induced 
phase transition in the sense that all sufficiently short-ranged disordered models fall into the same universality class 
as homogeneous RNA. The latter suggests that the behavior of disordered RNA at high temperatures is well captured 
by a coarse-grained homogeneous description with renormalized parameters. The effect of disorder is washed out by 
thermal fluctuations which allow for a large number of secondary structures to be explored so that the large entropy 
of secondary structures (with approximately the same number of bonds) dominates the physics. 

The situation is different at low temperatures where the molecule is restricted to a small number of favorable 
foldings. The disorder-averaged force-extension curves at zero temperature are shown in Fig. |l0| where the data 
collapse has been achieved with the general ansatz L(f) = N s A((/ — f^N 1 '"), optimizing for f c , S and v. As we will 
discuss in the next paragraph, the correlation length exponent v — 2 stays unchanged with respect to the homogeneous 
case. However, the exponent 8 is modified (5 — 0.7) |34[] . 



E. Harris-type criterion for the relevance of disorder 



The relevance of disorder for a phase transition can often be judged by applying Harris' criterion according to which 
disorder is relevant if the specific heat exponent a — dv — 2 is positive. Plugging in d = 1, since the sequence of bases 
is one-dimensional, and using the correlation length exponent v = 2, we are lead to conclude that disorder is marginal 
for the force transition. 

In order to derive this result, we start from the homogeneous model where the pairing behavior is correlated up to 
the scale £ ~ \f — f c \~ 2 ■ The introduction of disorder will locally modify the value of the critical force, f c — > f c + A/ c 
whereby this only makes sense as long as one considers substrands that are large compared to the bare correlation 
length. To estimate the typical fluctuations A/ c (£) for regions of size £ we observe that the opening transition is 
mainly of entropic nature (see section [V) and results from the competition between the gain in stretching energy 
and the decrease of the number of possible secondary structures when the chain changes from a globular state to a 
necklace with a larger number of closed substructures. 
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The effect of the pairing energies comes only as a perturbation. It is thus reasonable to expect that A/ c (£) scales 
like the fluctuations of the average binding energy per base for secondary structures that are restricted to a substrand 
of length £. This implies A/ c (£) - £~ 1/2 - 

Locally, the correlation length is modified according to 

£~ |/-/c-A/ c (0|" 2 . (26) 

The scaling A/ c (£) ~ ^ 1 / 2 ^ (J — J c ) corresponds just to the limiting case for which the critical force is still uniquely 
defined and the exponent v = 2 remains unchanged. This reflects the marginality as predicted by the vanishing of 
dv- 2. 

The above considerations are wrong if the disorder distribution has large tails in which case very large though 
rare couplings may dominate the secondary structure pattern. For the random coupling model with power law tails 
P(e) ~ |e| _Q we can easily find a lower bound on a below which disorder will significantly alter the behavior of 
the model, rendering it even non-self-averaging. The energy fluctuations will scale like the energy of rare favorable 
secondary structures. We may estimate the latter by considering a "greedy" algorithm that constructs a secondary 
structure by choosing iteratively the base pair with the most negative energy available while respecting the topological 
constraints imposed by the pairs already chosen. There are N(N—V)/2 pairing energies available for the first step, and 
the best among them will scale as N 2 / a . There will be of the order of log(iV) further choices that lead to comparable 
energies, while in later stages the pairing energies will be significantly smaller. We thus expect the disorder induced 
energy (fluctuations) to scale at least as AE(N) ~ N 2 / a log(iV), or Af c (N) ~ 7V 2 /"-i log(iV). Thus, the fluctuations 
dominate for a < 4. Indeed, we did not succeed in collapsing the numerical data for a — 3,4. 

F. Why is the force induced transition of the second order? 

It is rather unusual to find a continuous phase transition in force-extension experiments. The closely related 
globule-coil transition in polymers is of the first order which is a consequence of the large finite size corrections to 
the free energy of the globular phase of a chain with N elements, F g i(N) = f g iN + aN 2 ^ 3 . The term aN 2 / 3 takes 
into account solvent effects at the surface. Such corrections are essentially absent in the (extensive) random coil 
phase since all monomers are more or less in a similar environment; but the free energy depends on the external 
force / since the structure is extensible, F co n(N) = Nf co n(f). At the force where the extensive parts of the free 
energy become equal, f g i = f co a(f), a discontinuous transition from the globular to the random coil phase takes place 
pSl |39j. Mathematically, the first order nature of the force transition is reflected by the an essential rather than 
algebraic singularity in the Laplace transform of the partition function in the globular phase. 

In our model for RNA a surface term in the globular phase is absent since solvation energies and surface effects are 
part of the tertiary interactions that are far less important than the base pairing (at least for small and intermediate 
sizes). The finite size corrections of the free energy in the globular phase are only of order log (AT). Thus, a subdivision 
of the chain into a necklace of globules is less costly than in the presence of surface effects. At the critical force, this 
leads to a continuous crossover from a single large globule to a necklace containing an extensive number of smaller 
globules which takes place over a force window decreasing as N~ 1 ^ 2 . In a two-dimensional homogeneous model of the 
globule-coil transition jl0|], the authors found the force-transition to be continuous, too, which can be traced back to 
the absence of surface energies that grow polynomially with the system size. 

It is worth mentioning that the thermodynamic phase transition would be absent in our model if the secondary 
structures were not allowed to contain multi-branched loops, but were limited to single hairpins (with possible align- 
ment gaps). Instead, there would only be an opening crossover. Although the continuous phase transition is an 
otherwise robust feature of all models irrespective of the details of the pairing and stacking rules, it critically depends 
on the topological constraints. 

IV. DISCUSSION OF FORCE-EXTENSION CURVES IN THE THERMODYNAMIC LIMIT 

In a recent paper || the authors claim that the inclusion of large stacking energies in the model renders the force 
transition first order in contrast to the second order transition found in a model with only pairing energies. This has 
resulted from an erroneous analysis of a system of equations for generating functions that are real space analogs of 
our Eqs. (|l^) and (0). The authors were mislead by a sharp force-induced denaturation crossover that masks the 
true thermodynamic transition at a smaller force where the extension begins to grow only very slowly as a function 
of force. 

Before we discuss the general properties of force-extension curves, let us recall the parameters entering the model: 
Helices are required to contain at least n base pairs, and terminal loops closing a hairpin consist of at least t unpaired 
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bases. The distance between monomers in the backbone is denoted by I, while lb is the average distance between 
covalently bound bases. If we describe homogeneous RNA, we should use the empirically determined values lb « 4Z, 
n = 3 for the minimal helix length and t = 3 for the minimal hairpin loop. However, if the model is used to describe 
disordered RNA on a coarse-grained level, natural values are lb <C /, n = 1 and t = 0. 

The base pair interactions are described by two parameters, s(T) and i](T), accounting for the pairing and stacking 
energy per base pair, and the cost for the initiation of a helix, respectively. As mentioned earlier, under physiological 
salt conditions the cooperativity parameter r](T) is very small and thus favors the formation of long helices. The 
parameter s(T) = exp(— /3/ pa ir) is large at sufficiently low temperatures but approaches s « 1 in the denaturation 
regime. In the following discussion we will consider ry(T) to be small throughout and s(T) to be large at low 
temperatures, while approaching 1 around the denaturation temperature T^. We distinguish the three temperature 
regimes, dropping the explicit temperature dependence of s and 77: s ^> 1, l>s - 1> 77 1 / 3 and < s — 1 -C 77 1 / 3 . 
The case s < 1 corresponds to denatured RNA which is not of interest for force-extension studies. 

For analytical simplicity we use the model without loop cost function, <j> = 1, see Eq. (^). This is expected to be 
justified in the low temperature regime s 3> 1, as well as at high forces, while the results for s sw 1 at low force have 
to be taken with some care. 

For the details of the calculations the reader should refer to the appendix. 



A. Critical force and re-entrance 



In order to discuss the thermodynamic limit of force-extension curves, we need the free energy per base, </>(/), as a 
function of the force. It follows via ((f) = exp[— (3<f>(f)] from Eq. ( [l6| ) that we rewrite as 

U/)=C(/) + =ttP- (27) 

Here we treat the single strands linking the closed structures as freely jointed chains, whose free energy per base is 
related to Css(/) via Css(/) = ex p[ — P&ssif)]- The free energy per base in the globular phase, 0*, is determined from 
the singularity = exp(—/3<p t ) of 3 C . 

The chain begins to open when both free energies are equal, i.e., when <^>* = <p(f), and the critical force is determined 
by the equation = C(/c)- 

In the low temperature regime (s 3> 1), the critical force depends on the parameter t for the minimal length of 
terminal loops. For t > 1 we find 



/^-i^l/p-^l- (28) 

The dependence on t is due to the fact that each hairpin terminates in a loop with at least t unpaired bases. 
The corresponding loss in energy is very important at low temperatures and limits the thermally accessible phase 
space to rather elongated hairpinned structures with few branchings. This manifests itself in the critical force being 
proportional to the pairing free energy, almost as in the unzipping of DNA. 

For t = the situation is different since there is no energy cost associated to a hairpin, and therefore the available 
phase space of secondary structures is still large, even at low temperatures. The phase transition is governed by the 
competition between the force, trying to increase the number of closed structures in the free part, and the entropy 
that favors one big closed structure. The equation for the critical force reduces to 

XUfMfc) « 1 (29) 

and thus f c (T) cx T/l, almost independently of s. This reflects the purely entropic origin of the critical force sufficiently 
below denaturation . 

Clearly, t — corresponds to an unphysical situation if the monomers in our model are interpreted as nucleotides. 
However, if we regard the homogeneous model as a coarse-grained description of a disordered base sequence, the 
monomers in the model stand for short substrands with an average affinity to pair with other substrands. The 
frustration in the secondary structure of disordered RNA necessarily leads to gaps in the base pairing that are usually 
larger than the minimal length of terminal loops. It is therefore unnecessary to impose a constraint on the terminal 
loops, i.e., we may safely put t = in this case. At the same time, the minimal length n of helices and the length 
parameters I and lb have to renormalized appropriately, as we indicated earlier. 

In the denaturation regime, skI, the critical force becomes small. Independently of t, it decreases as 
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FIG. 11: Phase diagram of disordered RNA as a function of temperature and the force. The molecule undergoes a continuous 
opening transition at a critical force fc(T) that we predict to be non-monotonous as a function of temperature. This gives rise to 
a re-entrance phenomenon at fixed forces in a certain interval. At low temperature the system is in a glassy phase characterized 
by a small number of low-lying metastable states. At higher temperatures, RNA is in a molten state, that behaves in essentially 
the same way as a homopolymer. There is a thermal denaturation transition if the entropic penalties for loops are sufficiently 
large. Otherwise there is simply a crossover, and fc(T) never really vanishes. 



, 0((s - l) 1 ' 2 ) 1 > a - 1 > V 1/3 r - m 



on approaching denaturation. 

For t = an interesting re-entrance phenomenon occurs: We have seen that in this case f c (T) is an increasing 
function of temperature at sufficiently low temperatures, which is due to the entropic nature of the critical point. The 
large value of the binding parameter s merely forces the dominant secondary structure to have all bases paired, but 
does not influence the critical behavior otherwise. This picture does, however, not apply near denaturation where the 
base pairs are only loosely bound. Rather, Eq. (|30| ) shows that f c decreases essentially to zero in the denaturation 
regime. This is a consequence of the vanishing of the pair binding free energy (oc s — 1) which competes with the free 
energy gained from stretching. The latter grows as f 2 at low forces, and thus, the phase transition takes place on a 
force scale of the order of (s — l) 1 / 2 which vanishes at the denaturation transition. 

The different behavior at low and high temperatures implies that the critical force reaches a maximum somewhere 
below the denaturation temperature which gives rise to the following re-entrance phenomenon (see Fig. |TT[ ) : If one 
fixes the external force at a value smaller than the maximum of f c (T) and then decreases the temperature, starting 
in the denatured regime, the molecule will collapse to the globular state when it crosses the critical line for the first 
time. However, it will re-enter the stretched phase again at lower temperature when the second crossing occurs. Since 
the transitions are of second order this behavior would be seen as a (continuous) breathing of the molecule upon 
changing the temperature. We expect this effect to be relevant for disordered RNA-sequences where t — applies as 
we argued above. Such a behavior has been seen in numerical simulations of protein unfolding A similar effect 
was also predicted in the form of 'cold denaturation' in DNA unzipping G2l Eil. 



B. The opening crossover above f c 



In the following, we discuss several features of the force-extension curves in the thermodynamic limit, in particular 
we will derive how the linear slope above the critical point and the characteristics of the subsequent crossover depend 
on the energy parameters of the model. The results are illustrated in Figs. [l^,[l3| and |l4] where we plot force-extension 
curves for various pairs of s and 77. The structural parameters are those appropriate for homogeneous RNA (n = 3, 
t = 3, and li, = 41). 

As we discussed in section [II, the extension L(f) of the molecule slightly above the critical force grows like N(f—f c )- 
The prefactor and the range of validity of the linear regime can be calculated from an expansion of Eq. (]27n around 
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FIG. 12: / — Z-curves at large cooperativity: The initial linear slope at the transition is proportional to r\. The phase transition 
is masked by the subsequent crossover whose steepness scales as ?7 _1 ^ 2 , independently of a. 

UN 




FIG. 13: / — Z-curves at intermediate cooperativity: Critical point and crossover merge for sufficiently small values of s. 



the critical point. In the different temperature regimes, we find 



O (rjs 



-t/4\ 



S > 1, 



L(f)/(N(f-f c ))~ I 0(r)/(s -l) 4 ) l» S -l»r, 1 /3 j 



0(rr 1/3 ) 



s - K rj 



1/3 



(31) 



valid within a force window 



0(1) s»l, 
f-fc={ 0(a-l) l» S -l»r, 1 /3 j 



(32) 



Note that at low temperatures (s ^> 1) the extension is suppressed by a factor r\ (see Fig. |l2|). This is a consequence 
of the cooperativity in the system, i.e., the tendency to form long helices and large structures. Their size increases 
typically as so that necklaces composed of such structures are very short. The effect is even enhanced for t > 
where it is favorable to form longer helices in order to reduce the energy loss in terminal loops. The extension grows 
linearly with the force over a range of order O(l) which is comparable to the critical force itself. This is in contrast 
to the case near denaturation where the extent of the linear regime is much smaller than f c and the coefficient of the 
linear term can be appreciably large. 

At higher forces the force-extension characteristics becomes non-linear, and finally, the majority of base pairings 
opens up in a sharp crossover. This happens at a force /* where the free energy of bases in the free part of the chain 
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FIG. 14: / — Z-curves with no cooperativity: First the large closed structures open up to form smaller units. The plateau 
corresponds to necklaces of hairpins of the smallest possible size. Those structures are disrupted only at a higher force. This 
behavior is an artifact of very strong pairing energies. 



equals half of the base pairing free energy, Css(/*) ~ s -1 / 2 . Note that this is the analog of the critical force in DNA 
unzipping which is of a different nature than f c . Beyond /*, the molecule behaves essentially as a freely jointed chain. 

The crossover is most interesting at low temperatures, s» 1. In the appendix it is shown to take place within a 
force window determined by p(f ) = [r?s- (1+ * )/2 /C&(/)] 1/2 /ICss(/) - s~ 1/2 | = 0(1). Its width scales like 

a/ ~ (y+(w*-*-D/2) 1/2 M M V2 (33) 

where we assumed for homogeneous RNA. This expresses the fact that the smallest possible hairpin loop 

(with length I ■ (t + 1)) will almost be a direct, stretched bridge between the bases of the adjacent pair (at distance 
lb). The crossover is very sharp if the cooperativity, as measured by 77 -1 , is high. 

Below the crossover (p(f) <C 1) we find the extension to be small as compared to that of a freely jointed chain 
subject to the same force, 

^ - ^f^pif). (34) 

Above the crossover (p(f) ^> 1), the role played by closed structures is negligible, and the force-extension curve 
approaches that of a freely jointed chain, 



L(f) _ L FJC (f) 
N ~ N 



1 + 



P(f) 



(35) 



Near denaturation (s w 1), the situation is less interesting since both the phase transition and the final opening of 
the secondary structures take place at very small forces. Still, there is a discernible crossover in the case s-l> r/ 1 ^ 3 
that occurs at a force /* ~ / c + 0(s — 1) while its width scales like A/ ~ [r)/{s — l)] 1 ^ 2 - In the limit s — 1 <C rj 1 / 3 , 
however, no crossover can be seen since all characteristic force scales behave as 77 1 / 3 . 



V. CONCLUSION 



We have studied several aspects of RNA folding on the level of the secondary structure. We concentrated on a 
homogeneous model that we expect to describe random sequences on a coarse-grained level as well, provided that the 
parameters are appropriately renormalized. This point of view is strongly supported by the finding that, at sufficiently 
high temperatures, disorder is an irrelevant perturbation for the force- induced opening transition in the sense that 
scaling functions for disordered sequences superpose perfectly with the analytical curve calculated in the homogeneous 
case. 

Our model takes into account base pairing and stacking energies, as well as entropic costs for loops. The latter have 
been shown to give rise to a thermodynamic denaturation transition if they are sufficiently strong. However, even 



20 



when loop penalties are included, the dominant secondary structures in the collapsed phase of large molecules are too 
dense to be accommodated in three-dimensional space. Even at moderate, biologically relevant sizes, it is questionable 
whether the usual neglect of excluded volume (and other tertiary) interactions in the prediction of secondary structure 
is justified. 

The force-induced unfolding has been studied in detail. We have characterized the second order phase transition 
separating a globular from a necklace-like extensive phase. A correlation length diverging like (/ — f c )~ 2 has been 
identified as the typical size of the largest closed structures that appear in the necklace above f c . The critical scaling 
of the correlation length remains (marginally) unchanged upon introduction of disorder, as follows from a Harris-type 
criterion. However, in the low temperature, glassy phase the other critical exponents are modified, in contrast to the 
high temperature phase which belongs to the same universality class as a homogeneous polymer. This difference of 
the temperature regimes manifests itself as well in the force-extension characteristics. In the present paper, we have 
restricted ourselves to the discussion of the homogeneous or high temperature case. The jump-like force-extension 
curves in the glassy regime are discussed in . 

The second order phase transition in force-induced unfolding has been shown to be a very robust feature of the 
model. In particular, it is independent of the specific pairing and stacking energies and further structural parameters. 
It occurs at a critical force that we predict to be a non-monotonic function of temperature in the case of disordered 
sequences. This gives rise to a re-entrance phenomenon when the temperature is varied at constant force. 

When the stacking energy is large, the pairing behavior is highly cooperative. This can render the second order 
phase transition almost invisible, since the extension grows very slowly as a function of / — f c . On the other hand, a 
very sharp first-order-like crossover occurs at the (higher) force where essentially all base pairs are disrupted. 
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Appendix 

In this appendix we outline in more detail how to obtain the characteristics of the force-extension curves of section 
[V. 



The singularities of the partition function 

Let us start from Eq. ( p7y 

C-(/) = C(/) + ^7tP, (36) 

which determines the pole singularity C(/) as a function of the force for / > f c . The functional form of the generating 
function 2 C (£) for closed structures can be obtained from Eq. (||) as 



CT ,->\ _ i r ^sffli-o/q-cn if ^c 2 )"(i-c f )/(i-o y r ? « 2 )"c« 

" clW 2^ * 1 - < 2 + r](sC 2 ) n J V 4 V l-s^+^C 2 )™ J 1 - sC 2 + v(sC 2 ) n 

The free energy per base of the globular pha se is related to the singularity of S c via = exp[— /3<f) c ]. C* is given 
by the vanishing of the square root in Eq. (p7|). In the low temperature regime, s > 1, we find 




s -l/2 (1 - f ) t = 0, 

- 1/2 (1-I^) * = 1, 

- 1/2 (!-^) < = 2 > 

- 1/2 (1-Fk) *>3, 



(38) 
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while near the denaturation, s - 1 < 1, we obtain 



CJ^O-T^F) 1»«-W 3 . (39) 
{ 1- (27?) 1 / 3 s- 1 <? ? 1 / 3 . 



We recall that we assume ij <§C 1 throughout. 



The critical force 



The critical force f c has to be determined from the crossing of the two singularities, C(/ c ) = C*: or more explicitly, 

C 8S (/c)=C*+2 c (C*)/Cb(/c), (40) 
where we can use the fact that the square root in (B7h vanishes at £* , 



" c ^- y l_ sC 2 + 77 ( sC 2)n- 

In the low temperature regime, s > 1, we have to distinguish different values of t. In the case t > 1 we find 
approximately 

We can neglect the first term on the right hand side for t < 2(1^/1 + 1) which is always satisfied for homogeneous 
RNA where we have t w l^/l — 1. 

Recalling the definition s = exp(/3|/ pair (T)|) and approximating Css(/) = 01 f / sinh(/?Z /) w exp(— 01 f) and C&(/) = 
Plbf/ sinh(/%/) ~ exp(— /%/) at low temperatures, we find 



4(4 + ' 



Jc(T)« 777 -^ T |j pair (T)|. (43) 



The behavior for t = is different since all bases can be paired, even those at the end of a hairpin. In that case, 
( [40| ) reduces to 

2U/c)a(/c) « 1, (44) 

implying / C (T) cx T/Z, almost independently of s. This underlines the purely entropic origin of the critical force 
sufficiently below denaturation . 

In the regime s w 1 the critical force is small, decreasing as 

0((s-l)^ 2 ) l>s-l>r? 1 /3 
O^ 1 / 6 ) s - 1< 77 



on approaching the denaturation. This follows simply from an expansion of (40) at low forces 



The linear regime above f c 

In the thermodynamic limit the force-extension curve starts off linearly from zero extension at f c . To obtain the 
slope of the curve, we expand Eq. ( |36| ) around the critical point. In particular, we have to expand the square root in 
Eq. © at C*, 

2\n 



22 



As long as we can neglect BJC* — C) 2 with respect to A((* — £), i.e., for C* — C(/) ^ A/B, we may obtain £(/) 
approximately from (see Eq. (pq)) 

C»(/)Ci(/)l} « (/ - /cJK-sCd'C/c) » s c (c)|^ /} « -[A(c - C(/))] 1/2 , (47) 

and thus, 

The extension in the linear scaling regime above f c then follows via L(f) = — N[3~ 1 dln[£(f)]/df ~ iV(/ — / c )- In the 
different temperature regimes, the coefficient of N(f ~ f c ) scales like 

f O (rjs- 1 / 4 ) s > 1, 
L(f)/(N(f - f c )) ~ J O (W(s - l) 4 ) 1 » a - 1 » r, 1 / 3 , (49) 
[ 0(?r 1/3 ) s-K 77 1 / 3 . 

The linear regime extends up to forces determined by — £(/) ~ A/B, from which we obtain the force windows 

0(1) s»l, 

f-fc={ O(s-l) 1 » s - 1 » ryVa, (50) 
0(^/3) s-Kr? 1 / 3 . 



The non-linear regime and the crossover 



At higher forces the force-extension curve becomes non-linear, but the typical size of closed structures is still large 
so that the second term on the right-hand side of ( |36| ) cannot be neglected. Only in a later stage the base pairs open 
up completely, and the molecule becomes a freely jointed chain. 

In the further discussion of the characteristics of the force-extension curve we will restrict ourselves to (real) 
homogeneous RNA with l<f« h/l — 1. We will only treat the case s ^> 1 in some detail, the case s w 1 can be 
treated analogously. 

Beyond the linear regime, we can solve for C(/) using the following approximation: The generating function for 
closed structures can be simplified by expanding the square root in Eq. ( |37j ) with respect to the second term 

Sc(C (/)) « ™ (51) 

(i-C)[i-< 2 +^C 2 )™]-^« 2 )"^ 

The ( p7| ) for £(/) then reduces to a quadratic equation up to terms of the order of ^ff\, except in a narrow region 
around the force /* where the opening crossover takes places. The crossover force /* is approximately determined by 

Css(/*) — s ^ ■ 

Before proceeding let us estimate the width of the crossover. Considering the right hand side of Eq. (36), we note 
that the closed structures are important as long as the variation of 3 C (£(/)) /£;,(/) with force dominates that of £(/)• 
The crossover to the freely jointed chain regime occurs when both variations become comparable. Let us therefore 
write £(/) = s _1 ' 2 (l — and determine the value ex at which the correction s~ x l 2 ex begins to dominate the 

term S c (£(/))/£b(/). Using Eq. ( p3l| ) and the fact that f]/ex will be small, we find S c « s^ t ^ 2 r)/ex- Equating this to 
s- 1/2 e x Cb(f), we find 

a-V*e x = s- 1 ' 2 (if) ~ hs- (1+t)/2 Mf)] 1/2 . (52) 
More quantitatively, one finds that for forces such that ( S s{f) — s~ x l 2 ^> [r]s~^ 1+t ^ 2 / (b(f)} 1 ^ 2 , C(f) is given by 

af) " 8-1/2 k-mmth**))- (53) 

The extension follows from a logarithmic derivative, 

Hf) £fjc ?7^ t/2 C ss (/) 



N ' N tb(f)(Uf)-s-V 2 ) 2: 



(54) 
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which is small as compared to that of a freely jointed chain at the same force. 

For forces such that Css(/) — s -1 / 2 <C — [rjs~( 1+t ^ 2 / ^{f)] 1 ! 2 closed structures play a negligible role. To the same 
degree of approximation as before one finds 



C(/) « Css(/) 



1 + 



-(l+*)/2 



and the force-extension curve joins that of a freely jointed chain 



Hf) 



J FJC 



(/) 



1 + 



"(1+0/2 



Cfc(/)(C»(/) 



JV TV 

The force window over which the crossover takes place derives from (|52j) and scales like 

1/2 



A/ 



?/.s 



l + (l b /l-t-l)/2 



(55) 



(56) 



(57) 



For the case near denaturation, s 1 the calculations are analogous and yield a crossover around /* w / c + 0(s — 1) 
with a width scaling like A/ ~ [f]/(s — l)] 1 ^ 2 for s-l> 77 1 / 3 . In the limit s — 1 <C however, all characteristic 
force scales behave as 77 1 / 3 , and a separation of into different regimes does not make sense. 
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